of simple multi-stage decision processes where the intuitive
concept of maximizing expected gain over expected cost is valid.


DECISION MAKING IN THE FACE OF UNCERTAINTY, I
(Uncertain  Outcome) 

Richard  Bellman 

§1. Introduction

In logistics there is a large class of situations in which
we are to use a given resource repeatedly until it is exhausted.
If at each stage we have several different ways of utilizing
this resource, it is a problem of importance to determine the
procedure which maximizes the over-all value of the resource.

In a previous paper [1] we have given some applications of the
theory of dynamic programming to problems of this general class.

In this paper we wish to consider some representative
problems where the outcome of any stage is not completely
determined. We shall show that in some cases optimal policies of
quite simple and intuitive nature exist.

In subsequent papers we shall discuss some cases of uncertain
purpose, which is to say, processes in which we do not completely
know the form of the "payoff" or criterion function.

§2. Discussion

A problem of recurrent type is the following: We have a
certain quantity of resource which we can use in various fashions.
Any particular application yields a certain distribution
of returns balanced against a certain distribution of costs.

How do we proceed to use the resource until depleted so as to
maximize the average return?
A naive approach to the problem runs as follows: As a
result of any particular decision we obtain a certain expected
gain and suffer a certain expected cost. A reasonable procedure
for maximizing over-all gain would then seem to be one which
yields the maximum return per unit cost, i.e., one which
maximizes the ratio

$$R = \frac{\text{immediate expected gain}}{\text{immediate expected cost}} \tag{1.1}$$

This prescription for an optimal policy has many desirable
features. It is simple, intuitive, requires only an estimation
of average values rather than detailed knowledge of the
distribution of events, and is occasionally correct.
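As a minimal sketch of this prescription, the snippet below picks, from a list of candidate decisions, the one with the largest expected gain per unit expected cost. The `Action` type, the function name, and the numerical values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    expected_gain: float
    expected_cost: float  # assumed strictly positive


def best_ratio_action(actions):
    """Return the action maximizing expected gain / expected cost."""
    return max(actions, key=lambda a: a.expected_gain / a.expected_cost)


# Illustrative data only.
actions = [
    Action("A", expected_gain=3.0, expected_cost=2.0),  # ratio 1.5
    Action("B", expected_gain=5.0, expected_cost=4.0),  # ratio 1.25
    Action("C", expected_gain=1.0, expected_cost=0.5),  # ratio 2.0
]
choice = best_ratio_action(actions)
```

Only the two averages per action are needed, which is the simplicity the text points to; nothing about the shape of the distributions enters the rule.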

We shall discuss below a number of situations in which it
is approximately correct.

The purpose of the simple models discussed below is to
bolster our intuition, which is, after all, only a combination
of the results of theory and practice. As far as the multi-stage
processes of realistic type are concerned, existent theory
is meager. Consequently, it is important to build up a backlog
of as many mathematical models as possible with known solutions
in order to obtain clues to the realistic problems which usually
defy precise analysis.
§3. A Multi-stage Allocation Problem of Deterministic Type

Let us begin with a simpler process which is deterministic.
We have an initial resource x which may be utilized in a number
of ways. If y is a parameter specifying a particular use, we
let R(x,y) be the immediate return, and D(x,y) the cost in
resources. How do we proceed to utilize this resource so as to
maximize the total return?


We set (see [1], [2], [3])

f(x) = total return from an initial resource x,
using an optimal allocation policy.                    (3.1)


Then, as discussed in [1], [2], this function satisfies the
functional equation

$$f(x) = \max_y \left[ R(x,y) + f(x - D(x,y)) \right] \tag{3.2}$$
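The functional equation can be evaluated directly when the resource takes integer values. The sketch below is one way to do so, under assumptions that are not in the paper: three hypothetical uses whose return and cost do not depend on x, integer costs, and the convention that f(x) = 0 once no use is affordable.

```python
from functools import lru_cache

# Hypothetical uses y -> (return, cost); illustrative values only.
USES = {"a": (3, 2), "b": (5, 4), "c": (7, 5)}


def R(x, y):
    """Immediate return of use y at resource level x (here independent of x)."""
    return USES[y][0]


def D(x, y):
    """Resource cost of use y at resource level x (here independent of x)."""
    return USES[y][1]


@lru_cache(maxsize=None)
def f(x):
    """Total return from resource x under an optimal policy, per (3.2)."""
    feasible = [y for y in USES if D(x, y) <= x]
    if not feasible:
        return 0  # assumed boundary condition: nothing affordable remains
    return max(R(x, y) + f(x - D(x, y)) for y in feasible)
```

With these numbers f(5) is 7 (use "c" once), whereas repeatedly taking the best return-per-cost use "a" yields only 6. The discrepancy arises because the costs are not small compared to x, which is the situation the approximation that follows excludes.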

Let us now make the fundamental assumption that D(x,y) is
small compared to x for any y, i.e. $0 < D(x,y) \ll x$.
Proceeding formally, we may write

$$f(x) = \max_y \left[ R(x,y) + f(x) - D(x,y) f'(x) + \cdots \right] \tag{3.3}$$

which yields the approximate equation

$$0 = \max_y \left[ R(x,y) - D(x,y) f'(x) \right] \tag{3.4}$$
This means that for one y, say $\bar{y}$, we have

$$0 = R(x,\bar{y}) - D(x,\bar{y}) f'(x) \tag{3.5}$$

and for all others we have

$$0 \ge R(x,y) - D(x,y) f'(x) \tag{3.6}$$

Consequently, within the error contained in using (3.4)
in place of (3.3), we have

$$f'(x) = \max_y \frac{R(x,y)}{D(x,y)} \tag{3.7}$$

which  is  equivalent  to  the  statement  that  at  each  stage  we  use 
our  resource  in  accord  with  the  prescription  of  (1.1). 
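A quick numerical check of (3.7) is possible on an integer grid: when the resource is large compared to every cost, the slope of the exact optimal return should be close to the largest return-to-cost ratio. The sketch below assumes three hypothetical uses with x-independent returns and integer costs, and estimates f'(x) by a finite difference; these choices are illustrative and not taken from the paper.

```python
# Hypothetical (return, cost) pairs; illustrative values only.
USES = [(3.0, 2), (5.0, 4), (7.0, 5)]


def optimal_returns(x_max):
    """Tabulate f(0..x_max) from f(x) = max_y [R + f(x - D)], bottom-up."""
    f = [0.0] * (x_max + 1)
    for x in range(1, x_max + 1):
        # default=0.0 covers the case where no use is affordable
        f[x] = max((r + f[x - d] for r, d in USES if d <= x), default=0.0)
    return f


f = optimal_returns(1000)
best_ratio = max(r / d for r, d in USES)

# Finite-difference estimate of f'(x) far from the exhaustion boundary.
slope = (f[1000] - f[900]) / 100
```

In this example the estimated slope agrees with `best_ratio`, while for small x (costs comparable to the resource) the exact table and the ratio rule can differ.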


Let us now consider the process above where we have a
distribution of returns and allocations. Any particular utilization
yields a set of returns, z, characterized by a distribution
function dR(y,z,x), and a set of costs, w, characterized by a
distribution function dD(y,w,x). We now wish to maximize the
expected total return from an initial resource x. Denote this
quantity by f(x). Then, as above, f(x) satisfies the equation

$$f(x) = \max_y \left[ \int_0^\infty z \, dR(y,z,x) + f\left( x - \int_0^\infty w \, dD(y,w,x) \right) \right] \tag{4.1}$$


P-568 


Assuming, as before, that

$$\text{Expected Cost} = \int_0^\infty w \, dD(y,w,x) \ll x \tag{4.2}$$

we obtain as an approximation to (4.1), the equation

$$0 = \max_y \left[ \int_0^\infty z \, dR(y,z,x) - f'(x) \int_0^\infty w \, dD(y,w,x) \right] \tag{4.3}$$


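which, as in the passage from (3.4) to (3.7), gives

$$f'(x) = \max_y \frac{\int_0^\infty z \, dR(y,z,x)}{\int_0^\infty w \, dD(y,w,x)} \tag{4.4}$$

This is precisely the prescription of (1.1).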
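The stochastic case can be illustrated in the same spirit. The sketch below makes several assumptions that are not in the paper: two hypothetical uses with discrete outcome distributions, integer costs, a use being allowed only when its largest possible cost is affordable, and the expectation of f being taken over the realized cost. It tabulates the expected total return and compares its slope for large x with the largest ratio of expected gain to expected cost.

```python
# Hypothetical uses: lists of (probability, return z, cost w) outcomes.
USES = {
    "steady": [(1.0, 3.0, 2)],
    "risky": [(0.5, 8.0, 3), (0.5, 0.0, 1)],
}


def expected(outcomes, index):
    """Mean of the return (index 1) or the cost (index 2) of one use."""
    return sum(o[0] * o[index] for o in outcomes)


def expected_total_return(x_max):
    """Tabulate f(x) = max_y E[z + f(x - w)] on the integers 0..x_max."""
    f = [0.0] * (x_max + 1)
    for x in range(1, x_max + 1):
        candidates = [
            sum(p * (z + f[x - w]) for p, z, w in outcomes)
            for outcomes in USES.values()
            if max(w for _, _, w in outcomes) <= x  # assumed feasibility rule
        ]
        f[x] = max(candidates, default=0.0)
    return f


f = expected_total_return(1000)
best_ratio = max(expected(o, 1) / expected(o, 2) for o in USES.values())
slope = (f[1000] - f[900]) / 100
```

Here the "risky" use has the larger ratio of averages (4 over 2), and the slope of the tabulated f at large x matches that ratio closely, in line with the small-cost approximation.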


§5. Conclusion

We have discussed above two representative examples of
multi-stage allocation processes, in each of which the prescription
of (1.1) furnished an approximation to the optimal policy.

In those situations where the change in resources may be
large compared to initial resources, the above analysis is not
as useful. Nonetheless, even in these cases, this prescription
furnishes a useful first approximation which may be improved
by successive approximations based upon the functional equations
(see [1], [2], [3]).
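One possible reading of this improvement scheme, sketched for the deterministic case on an integer grid, is to start from the return earned by the ratio rule and repeatedly apply the functional equation until the estimate stops changing. The uses, the function names, and the stopping test below are assumptions made for illustration; the cited papers may organize the iteration differently.

```python
# Hypothetical (return, cost) pairs; illustrative values only.
USES = [(3.0, 2), (5.0, 4), (7.0, 5)]


def ratio_policy_value(x_max):
    """Return earned by always taking the affordable use with the best R/D."""
    f = [0.0] * (x_max + 1)
    for x in range(1, x_max + 1):
        feasible = [(r, d) for r, d in USES if d <= x]
        if feasible:
            r, d = max(feasible, key=lambda u: u[0] / u[1])
            f[x] = r + f[x - d]
    return f


def improve(f):
    """One sweep of f(x) <- max_y [R + f(x - D)] over the whole table."""
    return [
        max((r + f[x - d] for r, d in USES if d <= x), default=0.0)
        for x in range(len(f))
    ]


def successive_approximations(x_max):
    """Iterate from the ratio-rule estimate until a sweep changes nothing."""
    f = ratio_policy_value(x_max)
    sweeps = 0
    while True:
        g = improve(f)
        if g == f:
            return f, sweeps
        f, sweeps = g, sweeps + 1


first_guess = ratio_policy_value(10)
f, sweeps = successive_approximations(10)
```

In this example the first approximation gives 6 at x = 5, and the iteration raises it to 7, the value of spending the whole resource on the single use with return 7.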




BIBLIOGRAPHY 


1. Bellman, R., "On Some Applications of the Theory of Dynamic
   Programming to Logistics," Naval Research Logistics
   Quarterly (to appear).

2. Bellman, R., "Some Problems in the Theory of Dynamic
   Programming," Econometrica, Vol. 22, No. 1 (January 1954),
   pp. 37-48.

3. Bellman, R., "Some Applications of the Theory of Dynamic
   Programming: A Review," Journal of the Operations Research
   Society of America, Vol. 2, No. 3 (August 1954), pp. 275-288.




